Network-based Inpainting

Context Encoders: Feature Learning by Inpainting, by Deepak Pathak et al.

A review by Quentin LEROY and Bastien PONCHON

Experiments and weight randomization

The following notebook contains some of our experiments on the context encoder presented by Deepak Pathak et al. The paper, as well as pre-trained networks and instructions on how to use them, can be found on the context encoder project page: https://people.eecs.berkeley.edu/~pathak/context_encoder/

In [1]:
require 'image'
require 'nn'
torch.setdefaulttensortype('torch.FloatTensor')

First, let us load the pre-trained networks from the paper. The following cell requires the models to have been downloaded and extracted from the authors' webpage (http://www.cs.berkeley.edu/~pathak/context_encoder/resources/inpaintCenterModels.tar.gz) by running the following shell command from inside the context-encoder folder: bash ./models/scripts/download_inpaintCenter_models.sh

In [2]:
-- load the pre-trained network trained on the Paris StreetView dataset
netParis = torch.load('./context-encoder/models/inpaintCenter/paris_inpaintCenter.t7')
-- recreate zeroed gradient buffers (the saved snapshots may omit them)
netParis:apply(function(m) if m.weight then 
    m.gradWeight = m.weight:clone():zero(); 
    m.gradBias = m.bias:clone():zero(); end end)
netParis:evaluate()   -- inference mode (uses the stored batch-norm statistics)
netParis:float()

-- load the pre-trained network trained on the imagenet dataset
netImagenet = torch.load('./context-encoder/models/inpaintCenter/imagenet_inpaintCenter.t7')
-- recreate zeroed gradient buffers here as well
netImagenet:apply(function(m) if m.weight then 
    m.gradWeight = m.weight:clone():zero(); 
    m.gradBias = m.bias:clone():zero(); end end)
netImagenet:evaluate()
netImagenet:float()
In [3]:
print(netParis)
Out[3]:
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> output]
  (1): nn.Sequential {
    [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> output]
    (1): nn.SpatialConvolution(3 -> 64, 4x4, 2,2, 1,1)
    (2): nn.LeakyReLU(0.2)
    (3): nn.SpatialConvolution(64 -> 64, 4x4, 2,2, 1,1)
    (4): nn.SpatialBatchNormalization (4D) (64)
    (5): nn.LeakyReLU(0.2)
    (6): nn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
    (7): nn.SpatialBatchNormalization (4D) (128)
    (8): nn.LeakyReLU(0.2)
    (9): nn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
    (10): nn.SpatialBatchNormalization (4D) (256)
    (11): nn.LeakyReLU(0.2)
    (12): nn.SpatialConvolution(256 -> 512, 4x4, 2,2, 1,1)
    (13): nn.SpatialBatchNormalization (4D) (512)
    (14): nn.LeakyReLU(0.2)
    (15): nn.SpatialConvolution(512 -> 4000, 4x4)
  }
  (2): nn.SpatialBatchNormalization (4D) (4000)
  (3): nn.LeakyReLU(0.2)
  (4): nn.SpatialFullConvolution(4000 -> 512, 4x4)
  (5): nn.SpatialBatchNormalization (4D) (512)
  (6): nn.ReLU
  (7): nn.SpatialFullConvolution(512 -> 256, 4x4, 2,2, 1,1)
  (8): nn.SpatialBatchNormalization (4D) (256)
  (9): nn.ReLU
  (10): nn.SpatialFullConvolution(256 -> 128, 4x4, 2,2, 1,1)
  (11): nn.SpatialBatchNormalization (4D) (128)
  (12): nn.ReLU
  (13): nn.SpatialFullConvolution(128 -> 64, 4x4, 2,2, 1,1)
  (14): nn.SpatialBatchNormalization (4D) (64)
  (15): nn.ReLU
  (16): nn.SpatialFullConvolution(64 -> 3, 4x4, 2,2, 1,1)
  (17): nn.Tanh
}
{
  train : false
  output : FloatTensor - empty
  gradInput : FloatTensor - empty
  modules : 
    {
      1 : 
        nn.Sequential {
          [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> output]
          (1): nn.SpatialConvolution(3 -> 64, 4x4, 2,2, 1,1)
          (2): nn.LeakyReLU(0.2)
          (3): nn.SpatialConvolution(64 -> 64, 4x4, 2,2, 1,1)
          (4): nn.SpatialBatchNormalization (4D) (64)
          (5): nn.LeakyReLU(0.2)
          (6): nn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
          (7): nn.SpatialBatchNormalization (4D) (128)
          (8): nn.LeakyReLU(0.2)
          (9): nn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
          (10): nn.SpatialBatchNormalization (4D) (256)
          (11): nn.LeakyReLU(0.2)
          (12): nn.SpatialConvolution(256 -> 512, 4x4, 2,2, 1,1)
          (13): nn.SpatialBatchNormalization (4D) (512)
          (14): nn.LeakyReLU(0.2)
          (15): nn.SpatialConvolution(512 -> 4000, 4x4)
        }
        {
          train : false
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          modules : 
            {
              1 : 
                nn.SpatialConvolution(3 -> 64, 4x4, 2,2, 1,1)
                {
                  padW : 1
                  nInputPlane : 3
                  output : FloatTensor - empty
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  gradBias : FloatTensor - size: 64
                  dW : 2
                  nOutputPlane : 64
                  padH : 1
     
                   kH : 4
                  finput : FloatTensor - empty
                  weight : FloatTensor - size: 64x3x4x4
                  train : false
                  gradWeight : FloatTensor - size: 64x3x4x4
                  kW : 4
                  dH : 2
                  bias : FloatTensor - size: 64
                  fgradInput : FloatTensor - empty
                }
              2 : 
                nn.LeakyReLU(0.2)
                {
                  inplace : true
                  train : false
                  negval : 0.2
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  output : FloatTensor - empty
                }
              3 : 
                nn.SpatialConvolution(64 -> 64, 4x4, 2,2, 1,1)
                {
                  padW : 1
                  nInputPlane : 64
                  output : FloatTensor - empty
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  gradBias : FloatTensor - size: 64
                  dW : 2
                  nOutputPlane : 64
                  padH : 1
                  kH : 4
                  finput : FloatTensor - empty
                  weight : FloatTensor - size: 64x64x4x4
                  train : false
                  gradWeight : FloatTensor - size: 64x64x4x4
                  kW : 4
                  dH : 2
                  bias : FloatTensor - size: 64
                  fgradInput : FloatTensor - empty
                }
              4 : 
                nn.SpatialBatchNormalization (4D) (64)
                {
                  gradBias : FloatTensor - size: 64
                  _type : torch.FloatTensor
                  bias : FloatTensor - size: 64
                  gradInput : FloatTensor - empty
                  gradWeight : FloatTensor - size: 64
                  running_var : FloatTensor - size: 64
                  momentum : 0.1
                  eps : 1e-05
                  weight : FloatTensor - size: 64
                  train : false
                  affine : true
                  running_mean : FloatTensor - size: 64
                  output : FloatTensor - empty
                  save_std : FloatTensor - size: 64
                  save_mean : FloatTensor - size: 64
                }
              5 : 
                nn.LeakyReLU(0.2)
                {
                  inplace : true
                  train : false
                  negval : 0.2
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  output : FloatTensor - empty
                }
              6 : 
                nn.SpatialConvolution(64 -> 128, 4x4, 2,2, 1,1)
                {
                  padW : 1
                  nInputPlane : 64
                  output : FloatTensor - empty
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  gradBias : FloatTensor - size: 128
                  dW : 2
                  nOutputPlane : 128
                  padH : 1
                  kH : 4
                  finput : FloatTensor - empty
                  weight : FloatTensor - size: 128x64x4x4
                  train : false
                  gradWeight : FloatTensor - size: 128x64x4x4
                  kW : 4
                  dH : 2
                  bias : FloatTensor - size: 128
                  fgradInput : FloatTensor - empty
                }
              7 : 
                nn.SpatialBatchNormalization (4D) (128)
                {
                  gradBias : FloatTensor - size: 128
                  _type : torch.FloatTensor
                  bias : FloatTensor - size: 128
                  gradInput : FloatTensor - empty
                  gradWeight : FloatTensor - size: 128
                  running_var : FloatTensor - size: 128
                  momentum : 0.1
                  eps : 1e-05
                  weight : FloatTensor - size: 128
                  train : false
                  affine : true
                
                   running_mean : FloatTensor - size: 128
                   output : FloatTensor - empty
                   save_std : FloatTensor - size: 128
                   save_mean : FloatTensor - size: 128
                }
              8 : 
                nn.LeakyReLU(0.2)
                {
                  inplace : true
                  train : false
                  negval : 0.2
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  output : FloatTensor - empty
                }
              9 : 
                nn.SpatialConvolution(128 -> 256, 4x4, 2,2, 1,1)
                {
                  padW : 1
                  nInputPlane : 128
                  output : FloatTensor - empty
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  gradBias : FloatTensor - size: 256
                  dW : 2
                  nOutputPlane : 256
                  padH : 1
                  kH : 4
                  finput : FloatTensor - empty
                  weight : FloatTensor - size: 256x128x4x4
                  train : false
                  gradWeight : FloatTensor - size: 256x128x4x4
                  kW : 4
                  dH : 2
                  bias : FloatTensor - size: 256
                  fgradInput : FloatTensor - empty
                }
              10 : 
                nn.SpatialBatchNormalization (4D) (256)
                {
                  gradBias : FloatTensor - size: 256
                  _type : torch.FloatTensor
                  bias : FloatTensor - size: 256
                  gradInput : FloatTensor - empty
                  gradWeight : FloatTensor - size: 256
                  running_var : FloatTensor - size: 256
                  momentum : 0.1
                  eps : 1e-05
                  weight : FloatTensor - size: 256
                  train : false
                  affine : true
                  running_mean : FloatTensor - size: 256
                  output : FloatTensor - empty
                  save_std : FloatTensor - size: 256
                  save_mean : FloatTensor - size: 256
                }
              11 : 
                nn.LeakyReLU(0.2)
                {
                  inplace : true
                  train : false
                  negval : 0.2
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  output : FloatTensor - empty
                }
              12 : 
                nn.SpatialConvolution(256 -> 512, 4x4, 2,2, 1,1)
                {
                  padW : 1
                  nInputPlane : 256
                  output : FloatTensor - empty
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  gradBias : FloatTensor - size: 512
                  dW : 2
                  nOutputPlane : 512
                  padH : 1
                  kH : 4
                  finput : FloatTensor - empty
                  weight : FloatTensor - size: 512x256x4x4
                  train : false
                  gradWeight : FloatTensor - size: 512x256x4x4
                  kW : 4
                  dH : 2
                  bias : FloatTensor - size: 512
                  fgradInput : FloatTensor - empty
                }
              13 : 
                nn.SpatialBatchNormalization (4D) (512)
                {
                  gradBias : FloatTensor - size: 512
                  _type : torch.FloatTensor
                  bias : FloatTensor - size: 512
                  gradInput : FloatTensor - empty
                  gradWeight : FloatTensor - size: 512
                  running_var : FloatTensor - size: 512
                  momentum : 0.1
                  eps : 1e-05
                  weight : FloatTensor - size: 512
                  train : false
                  affine : true
                  running_mean : FloatTensor - size: 512
                  output : FloatTensor - empty
                  save_std : FloatTensor - size: 512
 
                   save_mean : FloatTensor - size: 512
                }
              14 : 
                nn.LeakyReLU(0.2)
                {
                  inplace : true
                  train : false
                  negval : 0.2
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  output : FloatTensor - empty
                }
              15 : 
                nn.SpatialConvolution(512 -> 4000, 4x4)
                {
                  padW : 0
                  nInputPlane : 512
                  output : FloatTensor - empty
                  gradInput : FloatTensor - empty
                  _type : torch.FloatTensor
                  gradBias : FloatTensor - size: 4000
                  dW : 1
                  nOutputPlane : 4000
                  padH : 0
                  kH : 4
                  finput : FloatTensor - empty
                  weight : FloatTensor - size: 4000x512x4x4
                  train : false
                  gradWeight : FloatTensor - size: 4000x512x4x4
                  kW : 4
                  dH : 1
                  bias : FloatTensor - size: 4000
                  fgradInput : FloatTensor - empty
                }
            }
          _type : torch.FloatTensor
        }
      2 : 
        nn.SpatialBatchNormalization (4D) (4000)
        {
          gradBias : FloatTensor - size: 4000
          _type : torch.FloatTensor
          bias : FloatTensor - size: 4000
          gradInput : FloatTensor - empty
          gradWeight : FloatTensor - size: 4000
          running_var : FloatTensor - size: 4000
          momentum : 0.1
          eps : 1e-05
          weight : FloatTensor - size: 4000
          save_mean : FloatTensor - size: 4000
          affine : true
          running_mean : FloatTensor - size: 4000
          output : FloatTensor - empty
          save_std : FloatTensor - size: 4000
          train : false
        }
      3 : 
        nn.LeakyReLU(0.2)
        {
          inplace : true
          train : false
          negval : 0.2
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          output : FloatTensor - empty
        }
      4 : 
        nn.SpatialFullConvolution(4000 -> 512, 4x4)
        {
          padW : 0
          nInputPlane : 4000
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          dH : 1
          adjH : 0
          nOutputPlane : 512
          bias : FloatTensor - size: 512
          kH : 4
          adjW : 0
          finput : FloatTensor - empty
          gradBias : FloatTensor - size: 512
          weight : FloatTensor - size: 4000x512x4x4
          train : false
          gradWeight : FloatTensor - size: 4000x512x4x4
          kW : 4
          padH : 0
          fgradInput : FloatTensor - empty
          dW : 1
        }
      5 : 
        nn.SpatialBatchNormalization (4D) (512)
        {
          gradBias : FloatTensor - size: 512
          _type : torch.FloatTensor
          bias : FloatTensor - size: 512
          gradInput : FloatTensor - empty
          gradWeight : FloatTensor - size: 512
          running_var : FloatTensor - size: 512
          momentum : 0.1
          eps : 1e-05
          weight : FloatTensor - size: 512
          save_mean : FloatTensor - size: 512
          affine : true
          running_mean : FloatTensor - size: 512
          output : FloatTensor - empty
          save_std : FloatTensor - size: 512
          train : false
        }
      6 : 
        nn.ReLU
        {
          inplace : true
          threshold : 0
          val : 0
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          train : false
        }
      7 : 
        nn.SpatialFullConvolution(512 -> 256, 4x4, 2,2, 1,1)
        {
          padW : 1
          nInputPlane : 512
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          dH : 2
          adjH : 0
          nOutputPlane : 256
          bias : FloatTensor - size: 256
          kH : 4
          adjW : 0
          finput : FloatTensor - empty
          gradBias : FloatTensor - size: 256
          weight : FloatTensor - size: 512x256x4x4
          train : false
          gradWeight : FloatTensor - size: 512x256x4x4
          kW : 4
          padH : 1
          fgradInput : FloatTensor - empty
          dW : 2
        }
      8 : 
        nn.SpatialBatchNormalization (4D) (256)
        {
          gradBias : FloatTensor - size: 256
          _type : torch.FloatTensor
          bias : FloatTensor - size: 256
          gradInput : FloatTensor - empty
          gradWeight : FloatTensor - size: 256
          running_var : FloatTensor - size: 256
          momentum : 0.1
          eps : 1e-05
          weight : FloatTensor - size: 256
          save_mean : FloatTensor - size: 256
          affine : true
          running_mean : FloatTensor - size: 256
          output : FloatTensor - empty
          save_std : FloatTensor - size: 256
          train : false
        }
      9 : 
        nn.ReLU
        {
          inplace : true
          threshold : 0
          val : 0
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          train : false
        }
      10 : 
        nn.SpatialFullConvolution(256 -> 128, 4x4, 2,2, 1,1)
        {
          padW : 1
          nInputPlane : 256
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          dH : 2
          adjH : 0
          nOutputPlane : 128
          bias : FloatTensor - size: 128
          kH : 4
          adjW : 0
          finput : FloatTensor - empty
          gradBias : FloatTensor - size: 128
          weight : FloatTensor - size: 256x128x4x4
          train : false
          gradWeight : FloatTensor - size: 256x128x4x4
          kW : 4
          padH : 1
          fgradInput : FloatTensor - empty
          dW : 2
        }
      11 : 
        nn.SpatialBatchNormalization (4D) (128)
        {
          gradBias : FloatTensor - size: 128
          _type : torch.FloatTensor
          bias : FloatTensor - size: 128
          gradInput : FloatTensor - empty
          gradWeight : FloatTensor - size: 128
          running_var : FloatTensor - size: 128
          momentum : 0.1
          eps : 1e-05
          weight : FloatTensor - size: 128
          save_mean : FloatTensor - size: 128
          affine : true
          running_mean : FloatTensor - size: 128
          output : FloatTensor - empty
          save_std : FloatTensor - size: 128
          train : false
        }
      12 : 
        nn.ReLU
        {
          inplace : true
          threshold : 0
          val : 0
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          train : false
        }
      13 : 
        nn.SpatialFullConvolution(128 -> 64, 4x4, 2,2, 1,1)
        {
          padW : 1
          nInputPlane : 128
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          dH : 2
          adjH : 0
          nOutputPlane : 64
          bias : FloatTensor - size: 64
          kH : 4
          adjW : 0
          finput : FloatTensor - empty
          gradBias : FloatTensor - size: 64
          weight : FloatTensor - size: 128x64x4x4
          train : false
          gradWeight : FloatTensor - size: 128x64x4x4
          kW : 4
          padH : 1
          fgradInput : FloatTensor - empty
          dW : 2
        }
      14 : 
        nn.SpatialBatchNormalization (4D) (64)
        {
          gradBias : FloatTensor - size: 64
          _type : torch.FloatTensor
          bias : FloatTensor - size: 64
          gradInput : FloatTensor - empty
          gradWeight : FloatTensor - size: 64
          running_var : FloatTensor - size: 64
          momentum : 0.1
          eps : 1e-05
          weight : FloatTensor - size: 64
          save_mean : FloatTensor - size: 64
          affine : true
          running_mean : FloatTensor - size: 64
          output : FloatTensor - empty
          save_std : FloatTensor - size: 64
          train : false
        }
      15 : 
        nn.ReLU
        {
          inplace : true
          threshold : 0
          val : 0
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          train : false
        }
      16 : 
        nn.SpatialFullConvolution(64 -> 3, 4x4, 2,2, 1,1)
        {
          padW : 1
          nInputPlane : 64
          output : FloatTensor - empty
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          dH : 2
          adjH : 0
          nOutputPlane : 3
          bias : FloatTensor - size: 3
          kH : 4
          adjW : 0
          finput : FloatTensor - empty
          gradBias : FloatTensor - size: 3
          weight : FloatTensor - size: 64x3x4x4
          train : false
          gradWeight : FloatTensor - size: 64x3x4x4
          kW : 4
          padH : 1
          fgradInput : FloatTensor - empty
          dW : 2
        }
      17 : 
        nn.Tanh
        {
          gradInput : FloatTensor - empty
          _type : torch.FloatTensor
          train : false
          output : FloatTensor - empty
        }
    }
  _type : torch.FloatTensor
}

As we can see, the context encoder is composed of two sub-networks: an encoder with 5 hidden layers (corresponding to the first three modules of the torch nn.Sequential object) and a decoder with 4 hidden layers (corresponding to the remaining 14 modules). The encoder compresses the context of the missing region into 4000 features (the output of the third module).
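As a sanity check on the printed architecture, the spatial sizes can be traced through the encoder with the standard convolution output formula. This is a small illustrative sketch of ours, independent of the notebook's tensors:

```lua
-- output size of a convolution: floor((in + 2*pad - k) / stride) + 1
local function conv_out(size, k, stride, pad)
  return math.floor((size + 2*pad - k) / stride) + 1
end

local size = 128  -- input resolution used in this notebook
-- five 4x4, stride-2, pad-1 convolutions halve the resolution each time:
-- 128 -> 64 -> 32 -> 16 -> 8 -> 4
for i = 1, 5 do
  size = conv_out(size, 4, 2, 1)
end
-- the final 4x4 convolution (stride 1, no padding) collapses 4x4 to 1x1
size = conv_out(size, 4, 1, 0)
print(size)  -- 1: the 4000 channels at this single position are the bottleneck
```

The decoder then mirrors this with SpatialFullConvolution layers, expanding the 1x1x4000 code back up to the 64x64 output patch.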

1. Testing the context-encoder

Let us test the pre-loaded networks on several images.

In [4]:
--First, let us load some images
inputSize = 128
image_ctx = torch.Tensor(3, 3, inputSize, inputSize) 
input_image_ctx = torch.Tensor(3, 3, inputSize, inputSize)

--Loading an image from the paris street view dataset
local input = image.load('./context-encoder/images/paris/021_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[1]:copy(input)

--Loading an image from the imagenet dataset
local input = image.load('./context-encoder/images/imagenet/020_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[2]:copy(input)

--Loading an image from ucberkeley dataset
local input = image.load('./context-encoder/images/ucberkeley/004_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[3]:copy(input)

-- visualizing the images side by side
viz = torch.Tensor(3, inputSize, 3*inputSize)
for i=1,3 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 3*256, 256)
itorch.image(viz)
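A minimal sketch of the value-range convention used in this cell (the function names are ours, for illustration only): image.load returns pixels in [0,1], and mul(2):add(-1) maps them to [-1,1], matching the Tanh output layer of the networks.

```lua
-- [0,1] -> [-1,1], as done above with input:mul(2):add(-1)
local function normalize(x)   return 2 * x - 1 end
-- [-1,1] -> [0,1], as done later with image_ctx:add(1):mul(0.5)
local function denormalize(y) return (y + 1) * 0.5 end

assert(normalize(0) == -1 and normalize(1) == 1)
assert(denormalize(normalize(0.25)) == 0.25)  -- round trip is exact
```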
In [5]:
--Let us remove a center region from each input image
real_center = image_ctx[{{},{},{1 + inputSize/4, inputSize/2 + inputSize/4},{1 + inputSize/4, inputSize/2 + inputSize/4}}]:clone()      -- copy by value

-- fill center region with mean value
image_ctx[{{},{1},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*117.0/255.0 - 1.0
image_ctx[{{},{2},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*104.0/255.0 - 1.0
image_ctx[{{},{3},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*123.0/255.0 - 1.0
input_image_ctx:copy(image_ctx)

-- visualizing the images side by side
viz = torch.Tensor(3, inputSize, 3*inputSize)
for i=1,3 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 3*256, 256)
itorch.image(viz)
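For reference, the fill constants 117, 104 and 123 above are per-channel dataset mean pixel values on the usual 0..255 scale, rescaled to the network's [-1,1] range. The helper name is ours:

```lua
-- rescale a 0..255 mean pixel value into the [-1,1] range used by the network
local function to_net_range(m) return 2 * m / 255 - 1 end

print(to_net_range(117))  -- about -0.082: a mid-grey in network coordinates
```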
In [6]:
-- run the context-encoder trained on the paris street view data set to inpaint center
pred_center_paris = netParis:forward(input_image_ctx)
print('Prediction: size: ', pred_center_paris:size(1)..' x '..pred_center_paris:size(2) ..' x '..pred_center_paris:size(3)..' x '..pred_center_paris:size(4))

-- run the context-encoder trained on the imagenet data set to inpaint center
pred_center_imagenet = netImagenet:forward(input_image_ctx)
print('Prediction: size: ', pred_center_imagenet:size(1)..' x '..pred_center_imagenet:size(2) ..' x '..pred_center_imagenet:size(3)..' x '..pred_center_imagenet:size(4))
Out[6]:
Prediction: size: 	3 x 3 x 64 x 64	
Out[6]:
Prediction: size: 	3 x 3 x 64 x 64	

Note that the context encoder takes as input the whole image (with the missing center region) and returns an output the size of the center region: 3 x 64 x 64 (and not of the same size as the input, as one could expect from an auto-encoder).
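The coordinates used throughout these cells follow from simple arithmetic on inputSize. A small sketch (using the same inputSize = 128 as above) makes the 64 x 64 output size and the 4-pixel overlap explicit:

```lua
local inputSize = 128
-- the removed center region spans these rows/columns:
local lo = 1 + inputSize/4             -- 33
local hi = inputSize/2 + inputSize/4   -- 96
-- so the predicted patch is (hi - lo + 1) = 64 pixels square per image.
-- When pasting the prediction back, the cells above trim a 4-pixel border on
-- each side, so only the inner 56x56 of the prediction replaces the fill and
-- the prediction slightly overlaps the surrounding context.
print(hi - lo + 1)
```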

In [7]:
-- Prediction from paris street view context-encoder
image_ctx = input_image_ctx:clone()
-- paste predicted center in the context
image_ctx[{{},{},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}]:copy(pred_center_paris[{{},{},{1 + 4, inputSize/2 - 4},{1 + 4, inputSize/2 - 4}}])

-- re-transform scale back to normal
image_ctx:add(1):mul(0.5)

-- visualizing the images side by side
viz = torch.Tensor(3, inputSize, 3*inputSize)
for i=1,3 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 3*256, 256)
itorch.image(viz)


-- Prediction from imagenet context-encoder

image_ctx = input_image_ctx:clone()
-- paste predicted center in the context
image_ctx[{{},{},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}]:copy(pred_center_imagenet[{{},{},{1 + 4, inputSize/2 - 4},{1 + 4, inputSize/2 - 4}}])

-- re-transform scale back to normal
image_ctx:add(1):mul(0.5)

-- visualizing the images side by side
viz = torch.Tensor(3, inputSize, 3*inputSize)
for i=1,3 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 3*256, 256)
itorch.image(viz)

Note that each context-encoder performs slightly better on images of the same kind as those in the dataset it was trained on. Hence, the first context-encoder (first row) performs better on the Paris StreetView image (first image) than the second context-encoder (second row), which in turn performs better on the image from the imagenet dataset (second image).
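To complement this qualitative comparison, one could compute a reconstruction error over the true center region. Below is a framework-independent sketch over plain Lua arrays (the function name is ours); with the torch tensors above, (pred - real):pow(2):mean() computed per image would be the equivalent.

```lua
-- mean squared error between a predicted patch and the ground-truth patch,
-- both flattened into plain Lua arrays of the same length
local function mse(pred, real)
  local s = 0
  for i = 1, #pred do
    s = s + (pred[i] - real[i])^2
  end
  return s / #pred
end

print(mse({0.5, 0.5}, {0.0, 1.0}))  -- 0.25
```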

In [8]:
--Load the image in the same way as before
--This time we will fill the masked region with a 0 value instead of a mean value.

--First, let us load some images
inputSize = 128
image_ctx = torch.Tensor(3, 3, inputSize, inputSize) 
input_image_ctx = torch.Tensor(3, 3, inputSize, inputSize)

--Loading an image from the paris street view dataset
local input = image.load('./context-encoder/images/paris/021_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[1]:copy(input)

--Loading an image from the imagenet dataset
local input = image.load('./context-encoder/images/imagenet/020_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[2]:copy(input)

--Loading an image from ucberkeley dataset
local input = image.load('./context-encoder/images/ucberkeley/004_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[3]:copy(input)

-- visualizing the images side by side
viz = torch.Tensor(3, inputSize, 3*inputSize)
for i=1,3 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 3*256, 256)
itorch.image(viz)

--Let us remove a center region from each input image
real_center = image_ctx[{{},{},{1 + inputSize/4, inputSize/2 + inputSize/4},{1 + inputSize/4, inputSize/2 + inputSize/4}}]:clone()      -- copy by value

-- fill the center region, this time with 0 values instead of the mean
image_ctx[{{},{1},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 0--2*117.0/255.0 - 1.0
image_ctx[{{},{2},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 0--2*104.0/255.0 - 1.0
image_ctx[{{},{3},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 0--2*123.0/255.0 - 1.0
input_image_ctx:copy(image_ctx)

-- visualizing the images side by side
viz = torch.Tensor(3, inputSize, 3*inputSize)
for i=1,3 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 3*256, 256)
itorch.image(viz)

-- run the context-encoder trained on the paris street view data set to inpaint center
pred_center_paris = netParis:forward(input_image_ctx)
print('Prediction: size: ', pred_center_paris:size(1)..' x '..pred_center_paris:size(2) ..' x '..pred_center_paris:size(3)..' x '..pred_center_paris:size(4))

-- run the context-encoder trained on the imagenet data set to inpaint center
pred_center_imagenet = netImagenet:forward(input_image_ctx)
print('Prediction: size: ', pred_center_imagenet:size(1)..' x '..pred_center_imagenet:size(2) ..' x '..pred_center_imagenet:size(3)..' x '..pred_center_imagenet:size(4))


-- Prediction from paris street view context-encoder
image_ctx = input_image_ctx:clone()
-- paste predicted center in the context
image_ctx[{{},{},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}]:copy(pred_center_paris[{{},{},{1 + 4, inputSize/2 - 4},{1 + 4, inputSize/2 - 4}}])

-- re-transform scale back to normal
image_ctx:add(1):mul(0.5)

-- visualizing the images side by side
viz = torch.Tensor(3, inputSize, 3*inputSize)
for i=1,3 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 3*256, 256)
itorch.image(viz)


-- Prediction from imagenet context-encoder

image_ctx = input_image_ctx:clone()
-- paste predicted center in the context
image_ctx[{{},{},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}]:copy(pred_center_imagenet[{{},{},{1 + 4, inputSize/2 - 4},{1 + 4, inputSize/2 - 4}}])

-- re-transform scale back to normal
image_ctx:add(1):mul(0.5)

-- visualizing the images side by side
viz = torch.Tensor(3, inputSize, 3*inputSize)
for i=1,3 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 3*256, 256)
itorch.image(viz)
Out[8]:
Prediction: size: 	3 x 3 x 64 x 64	
Out[8]:
Prediction: size: 	3 x 3 x 64 x 64	

We note that changing the color of the masked region changes the context-encoder's output: the network is therefore not independent of the content of the masked region.
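To probe this sensitivity, one could refill the masked center with an arbitrary uniform color before running the network and compare the two predictions. A minimal sketch (the `grayLevel` value and the comparison metric are our own choices, not from the paper):

```lua
-- Sketch: refill the masked center with an arbitrary gray level instead of
-- the training-time mean color, then re-run the Paris street view network.
-- Assumes input_image_ctx, inputSize and netParis are defined as above.
local grayLevel = 0.8                         -- value in [0,1], chosen arbitrarily
local lo = 1 + inputSize/4 + 4                -- mask bounds, as in the cells above
local hi = inputSize/2 + inputSize/4 - 4
local altered = input_image_ctx:clone()
altered[{{},{},{lo,hi},{lo,hi}}] = 2*grayLevel - 1.0   -- rescale to [-1,1]

-- clone the outputs: forward() reuses the same output buffer between calls
local pred_ref = netParis:forward(input_image_ctx):clone()
local pred_alt = netParis:forward(altered):clone()

-- mean absolute difference between the two predicted centers
print((pred_alt - pred_ref):abs():mean())
```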

2. Inpainting on simple binary images¶

It seems that the model performs quite well on simple structures, such as straight lines and regular curves. Let us test our two context-encoders on binary images containing such shapes.
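All the test images built below start from a white (or black) square surrounded by a one-pixel black border; a small helper could factor out that repeated pattern. A possible sketch (the `borderedImage` helper is ours, not part of the original code):

```lua
-- Sketch of a helper producing a 3-channel image filled with `value`
-- and surrounded by a 1-pixel black border, as in the cells below.
function borderedImage(size, value)
    local img = torch.Tensor(3, size, size):fill(value)
    img[{{}, {1}, {}}] = 0        -- top row
    img[{{}, {size}, {}}] = 0     -- bottom row
    img[{{}, {}, {1}}] = 0        -- left column
    img[{{}, {}, {size}}] = 0     -- right column
    return img
end

-- e.g. the first test image would simply be:
-- image_ctx[1] = borderedImage(inputSize, 1)
```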

In [81]:
--First, let us load some images
inputSize = 128
image_ctx = torch.Tensor(7, 3, inputSize, inputSize) 
input_image_ctx = torch.Tensor(7, 3, inputSize, inputSize)
In [82]:
img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
image_ctx[1] = img
itorch.image(img)
In [83]:
img = torch.Tensor(3, inputSize, inputSize):zero() --the context encoder needs three channels as input
img[{{}, {}, {1,inputSize/2}}] = torch.ones(3, inputSize, inputSize/2)
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
image_ctx[2] = img
itorch.image(img)
In [84]:
img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {}}] = torch.zeros(3, 3, inputSize)
image_ctx[3] = img
itorch.image(img)
In [85]:
img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {inputSize/2+1, inputSize}}] = torch.zeros(3, 3, inputSize/2)
img[{{}, {inputSize/2-1, inputSize}, {inputSize/2-1, inputSize/2 +1}}] = torch.zeros(3, inputSize/2+2, 3)
image_ctx[4] = img
itorch.image(img)
In [86]:
img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {}}] = torch.zeros(3, 3, inputSize)
img[{{}, {}, {inputSize/2-1, inputSize/2 +1}}] = torch.zeros(3, inputSize, 3)
image_ctx[5] = img
itorch.image(img)
In [87]:
img = torch.Tensor(3, inputSize, inputSize):zeros(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)

img[{{1}, {inputSize/2+1, inputSize}, {inputSize/2+1, inputSize}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{2}, {inputSize/2+1, inputSize}, {1, inputSize/2}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{3}, {1, inputSize/2}, {inputSize/2+1, inputSize}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{}, {1, inputSize/2}, {1, inputSize/2}}] = torch.ones(3, inputSize/2, inputSize/2)

image_ctx[6] = img
itorch.image(img)
In [88]:
img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)

R = 2048
xc = 3*inputSize/4
yc = inputSize/2
for x=1,inputSize do
    for y=1,inputSize do
        tmp = (x-xc)*(x-xc) + (y-yc)*(y-yc)
        if tmp<R then img[{{}, {x}, {y}}] = 0 end
    end
end



image_ctx[7] = img
itorch.image(img)
In [89]:
image_ctx:mul(2):add(-1) -- rescale from [0,1] to [-1,1]

-- fill the masked center with the mean color used during training
image_ctx[{{},{1},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*117.0/255.0 - 1.0
image_ctx[{{},{2},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*104.0/255.0 - 1.0
image_ctx[{{},{3},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*123.0/255.0 - 1.0
input_image_ctx:copy(image_ctx)

-- visualizing images side by side
viz = torch.Tensor(3, inputSize, 7*inputSize)
for i=1,7 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

itorch.image(viz)
In [90]:
-- run the context-encoder trained on the paris street view data set to inpaint center
pred_center_paris = netParis:forward(input_image_ctx)

-- run the context-encoder trained on the imagenet data set to inpaint center
pred_center_imagenet = netImagenet:forward(input_image_ctx)
In [93]:
-- Prediction from paris street view context-encoder

image_ctx = input_image_ctx:clone()
-- paste predicted center in the context
image_ctx[{{},{},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}]:copy(pred_center_paris[{{},{},{1 + 4, inputSize/2 - 4},{1 + 4, inputSize/2 - 4}}])

-- re-transform scale back to normal
image_ctx:add(1):mul(0.5)

-- visualizing images side by side
viz = torch.Tensor(3, inputSize, 7*inputSize)
for i=1,7 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

itorch.image(viz)



-- Prediction from imagenet context-encoder

image_ctx = input_image_ctx:clone()
-- paste predicted center in the context
image_ctx[{{},{},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}]:copy(pred_center_imagenet[{{},{},{1 + 4, inputSize/2 - 4},{1 + 4, inputSize/2 - 4}}])

-- re-transform scale back to normal
image_ctx:add(1):mul(0.5)

-- visualizing images side by side
viz = torch.Tensor(3, inputSize, 7*inputSize)
for i=1,7 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

itorch.image(viz)

We note that the network trained on imagenet images performs better on these simple shapes.

3. Changing the weights at various depths¶

In [2]:
--First, let us load some images
inputSize = 128
image_ctx = torch.Tensor(4, 3, inputSize, inputSize) 
input_image_ctx = torch.Tensor(4, 3, inputSize, inputSize)

--Loading an image from the paris street view dataset
local input = image.load('./context-encoder/images/paris/021_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[1]:copy(input)

--Loading an image from the paris street view dataset
local input = image.load('./context-encoder/images/paris/005_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[2]:copy(input)

--Loading an image from the imagenet dataset
local input = image.load('./context-encoder/images/imagenet/020_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[3]:copy(input)

--Loading an image from the imagenet dataset
local input = image.load('./context-encoder/images/imagenet/005_im.png', 3, 'float')
input = image.scale(input, inputSize, inputSize)
input:mul(2):add(-1)
image_ctx[4]:copy(input)

--image_ctx:add(1):mul(0.5)

-- visualizing images side by side
viz = torch.Tensor(3, inputSize, 4*inputSize)
for i=1,4 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 4*256, 256)
itorch.image(viz)

--Let us remove a center region from each input image
local real_center = image_ctx[{{},{},{1 + inputSize/4, inputSize/2 + inputSize/4},{1 + inputSize/4, inputSize/2 + inputSize/4}}]:clone()      -- copy by value

-- fill center region with mean value
image_ctx[{{},{1},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*117.0/255.0 - 1.0
image_ctx[{{},{2},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*104.0/255.0 - 1.0
image_ctx[{{},{3},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*123.0/255.0 - 1.0
input_image_ctx:copy(image_ctx)

-- visualizing images side by side
viz = torch.Tensor(3, inputSize, 4*inputSize)
for i=1,4 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 4*256, 256)
itorch.image(viz)

Now let us try to analyse the importance of each layer of the context-encoder by randomly modifying its weights. But first, let us take a deeper look at the context-encoder architecture:

In [11]:
function loadNet(path)
    net = torch.load(path)
    net:apply(function(m) if m.weight then 
        m.gradWeight = m.weight:clone():zero(); 
        m.gradBias = m.bias:clone():zero(); end end)
    net:evaluate()
    net:float()
    return net
end
In [12]:
function inpainting(net, input_image_ctx)
    pred_center = net:forward(input_image_ctx)
    nIm = input_image_ctx:size(1)
    -- paste the predicted center back into the context (works for either pre-trained network)
    image_ctx = input_image_ctx:clone()
    -- paste predicted center in the context
    image_ctx[{{},{},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}]:copy(pred_center[{{},{},{1 + 4, inputSize/2 - 4},{1 + 4, inputSize/2 - 4}}])

    -- re-transform scale back to normal
    image_ctx:add(1):mul(0.5)

    --return image_ctx
    
    -- visualizing images side by side
    viz = torch.Tensor(3, inputSize, nIm*inputSize)
    for i=1,nIm do 
        viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
    end
    
    viz = image.scale(viz, nIm*256, 256)
    itorch.image(viz)
end
In [6]:
net = loadNet('./context-encoder/models/inpaintCenter/paris_inpaintCenter.t7')
In [7]:
inpainting(net, input_image_ctx)
Out[7]:

In [11]:
--randomize the weights of the layer m
function random_weights(m)
    local name = torch.type(m)
    if name:find('Convolution') then
        nb_kernel = m.weight:size(2)
        for i=1,nb_kernel do
            mean = m.weight[{{}, {i}, {}, {}}]:mean()
            std = m.weight[{{}, {i}, {}, {}}]:std()
            m.weight[{{}, {i}, {}, {}}]:normal(mean, std)
        end
   end
end
In [10]:
--print the results obtained when randomizing the weights of each convolutional layer in turn
-- (all other parameters keeping their pre-trained values)
-- assumes nn.Sequential modules are nested at most one level deep inside the top-level nn.Sequential
-- written for these models, which have a total of 11 convolutional layers
function randomizeWeights(path, input_image_ctx)
    net = loadNet(path)
    L = net:size()
    for i=1,L do
        m = net:get(i)
        if torch.type(m):find('Sequential') then
            subL = m:size()
            for j=1,subL do
                n = m:get(j)
                local name = torch.type(n)
                if name:find('Convolution') then
                    random_weights(n)
                    inpainting(net, input_image_ctx)
                    net = loadNet(path)
                    m = net:get(i)
                end
            end
        else 
            local name = torch.type(m)
            if name:find('Convolution') then
                random_weights(m)
                inpainting(net, input_image_ctx)
                net = loadNet(path)
            end
        end
    end
end
In [12]:
randomizeWeights('./context-encoder/models/inpaintCenter/paris_inpaintCenter.t7', input_image_ctx)
Out[12]:

In [48]:
randomizeWeights('./context-encoder/models/inpaintCenter/imagenet_inpaintCenter.t7', input_image_ctx)
Out[48]:

4. Playing with the mask¶

Let us try to add another, non-masked region filled with the same mean color as the mask.

In [9]:
--First, create again binary images
inputSize = 128
image_ctx = torch.Tensor(7, 3, inputSize, inputSize) 
input_image_ctx = torch.Tensor(7, 3, inputSize, inputSize)

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
image_ctx[1]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):zero() --the context encoder needs three channels as input
img[{{}, {}, {1,inputSize/2}}] = torch.ones(3, inputSize, inputSize/2)
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
image_ctx[2]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {}}] = torch.zeros(3, 3, inputSize)
image_ctx[3]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {inputSize/2+1, inputSize}}] = torch.zeros(3, 3, inputSize/2)
img[{{}, {inputSize/2-1, inputSize}, {inputSize/2-1, inputSize/2 +1}}] = torch.zeros(3, inputSize/2+2, 3)
image_ctx[4]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {}}] = torch.zeros(3, 3, inputSize)
img[{{}, {}, {inputSize/2-1, inputSize/2 +1}}] = torch.zeros(3, inputSize, 3)
image_ctx[5]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):zeros(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)

img[{{1}, {inputSize/2+1, inputSize}, {inputSize/2+1, inputSize}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{2}, {inputSize/2+1, inputSize}, {1, inputSize/2}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{3}, {1, inputSize/2}, {inputSize/2+1, inputSize}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{}, {1, inputSize/2}, {1, inputSize/2}}] = torch.ones(3, inputSize/2, inputSize/2)

image_ctx[6]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)

R = 2048
xc = 3*inputSize/4
yc = inputSize/2
for x=1,inputSize do
    for y=1,inputSize do
        tmp = (x-xc)*(x-xc) + (y-yc)*(y-yc)
        if tmp<R then img[{{}, {x}, {y}}] = 0 end
    end
end

image_ctx[7]:copy(img)
 


-- visualizing images side by side
viz = torch.Tensor(3, inputSize, 7*inputSize)
for i=1,7 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 7*256, 256)
itorch.image(viz)

image_ctx:mul(2):add(-1)


-- fill a non masked region with mean value
image_ctx[{{},{1},{10 , inputSize/4},{inputSize/4, inputSize/2 + inputSize/4}}] = 2*117.0/255.0 - 1.0
image_ctx[{{},{2},{10 , inputSize/4},{inputSize/4, inputSize/2 + inputSize/4}}] = 2*104.0/255.0 - 1.0
image_ctx[{{},{3},{10 , inputSize/4},{inputSize/4, inputSize/2 + inputSize/4}}] = 2*123.0/255.0 - 1.0



--Let us add a non masked region of the same color as the mask.
image_ctx[{{},{1},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*117.0/255.0 - 1.0
image_ctx[{{},{2},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*104.0/255.0 - 1.0
image_ctx[{{},{3},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4},{1 + inputSize/4 + 4, inputSize/2 + inputSize/4 - 4}}] = 2*123.0/255.0 - 1.0




input_image_ctx:copy(image_ctx)

-- visualizing images side by side
viz = torch.Tensor(3, inputSize, 7*inputSize)
for i=1,7 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 7*256, 256)
itorch.image(viz)
In [87]:
net = loadNet('./context-encoder/models/inpaintCenter/paris_inpaintCenter.t7')
In [88]:
inpainting(net, input_image_ctx)
Out[88]:

In [13]:
net = loadNet('./context-encoder/models/inpaintCenter/imagenet_inpaintCenter.t7')
inpainting(net, input_image_ctx)

We note that the modified region does not have a large impact on the result. However, when comparing with the results obtained earlier on the unaltered images, we notice that the reconstruction is slightly influenced (especially for the angle and the cross).

Finally, let us feed the whole images, without any mask, as input to the model.

In [90]:
--First, let us load some images
inputSize = 128
image_ctx = torch.Tensor(7, 3, inputSize, inputSize) 
input_image_ctx = torch.Tensor(7, 3, inputSize, inputSize)

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
image_ctx[1]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):zero() --the context encoder needs three channels as input
img[{{}, {}, {1,inputSize/2}}] = torch.ones(3, inputSize, inputSize/2)
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
image_ctx[2]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {}}] = torch.zeros(3, 3, inputSize)
image_ctx[3]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {inputSize/2+1, inputSize}}] = torch.zeros(3, 3, inputSize/2)
img[{{}, {inputSize/2-1, inputSize}, {inputSize/2-1, inputSize/2 +1}}] = torch.zeros(3, inputSize/2+2, 3)
image_ctx[4]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize/2-1, inputSize/2 +1}, {}}] = torch.zeros(3, 3, inputSize)
img[{{}, {}, {inputSize/2-1, inputSize/2 +1}}] = torch.zeros(3, inputSize, 3)
image_ctx[5]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):zeros(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)

img[{{1}, {inputSize/2+1, inputSize}, {inputSize/2+1, inputSize}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{2}, {inputSize/2+1, inputSize}, {1, inputSize/2}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{3}, {1, inputSize/2}, {inputSize/2+1, inputSize}}] = torch.ones(1, inputSize/2, inputSize/2)
img[{{}, {1, inputSize/2}, {1, inputSize/2}}] = torch.ones(3, inputSize/2, inputSize/2)

image_ctx[6]:copy(img)
 

img = torch.Tensor(3, inputSize, inputSize):ones(3, inputSize, inputSize) --the context encoder needs three channels as input
img[{{}, {1}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {1}}] = torch.zeros(3, inputSize)
img[{{}, {inputSize}, {}}] = torch.zeros(3, inputSize)
img[{{}, {}, {inputSize}}] = torch.zeros(3, inputSize)

R = 2048
xc = 3*inputSize/4
yc = inputSize/2
for x=1,inputSize do
    for y=1,inputSize do
        tmp = (x-xc)*(x-xc) + (y-yc)*(y-yc)
        if tmp<R then img[{{}, {x}, {y}}] = 0 end
    end
end

image_ctx[7]:copy(img)
 

image_ctx:mul(2):add(-1)

input_image_ctx:copy(image_ctx)

-- visualizing images side by side
viz = torch.Tensor(3, inputSize, 7*inputSize)
for i=1,7 do 
    viz[{{},{},{(i-1)*inputSize+1, i*inputSize}}]:copy(image_ctx[i])
end

viz = image.scale(viz, 7*256, 256)
itorch.image(viz)
In [91]:
inpainting(net, input_image_ctx)
Out[91]:

These results make us wonder about the ability of the context encoder to perform inpainting with masks of random shapes, positions and sizes. Indeed, these mask parameters seem to have quite an influence on the output of the model.
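As a follow-up experiment, one could place the square mask at a random position instead of the fixed center. A minimal sketch (our own code, not from the paper; it reuses the mean color values and mask size of the cells above):

```lua
-- Sketch: mask a square region at a random position instead of the center.
-- Assumes image_ctx (already rescaled to [-1,1]) and inputSize are defined as above.
local maskSize = inputSize/2 - 8              -- same size as the centered mask
local meanColor = {2*117/255 - 1, 2*104/255 - 1, 2*123/255 - 1}
local x0 = torch.random(1, inputSize - maskSize + 1)
local y0 = torch.random(1, inputSize - maskSize + 1)
for c = 1, 3 do
    image_ctx[{{}, {c}, {y0, y0 + maskSize - 1}, {x0, x0 + maskSize - 1}}] = meanColor[c]
end
-- Note: the pre-trained inpaintCenter networks were trained with a fixed central
-- mask, so their 64x64 prediction would no longer align with this region when
-- pasted back; the experiment only probes the encoder's sensitivity to mask position.
```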